Chinese Spoken Document Summarization Using Probabilistic Latent Topical Information
The purpose of extractive summarization is to automatically select a number of indicative sentences, passages, or paragraphs from the original document according to a target summarization ratio, and then sequence them to form a concise summary. In this paper, we propose the use of probabilistic latent topical information for extractive summarization of spoken documents. Various modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities were verified by comparison with the conventional vector space model (VSM), the latent semantic indexing (LSI) model, and the hidden Markov model (HMM) approach. The experiments were performed on Chinese broadcast news collected in Taiwan, and noticeable performance gains were obtained.
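To make the selection step concrete, here is a minimal sketch of the conventional vector-space baseline the abstract compares against, not the authors' probabilistic latent topical method. It assumes each sentence is scored by cosine similarity between its term-count vector and the whole document's term-count vector, and that the summarization ratio is measured in sentences (it could equally be measured in words or characters). The tokenizer and function names are hypothetical; real Chinese spoken documents would first need word or syllable segmentation of the recognizer output.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: splits on word characters only.
    # Chinese transcripts would need word or syllable segmentation instead.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, ratio):
    # Represent each sentence and the whole document as term-count vectors.
    vectors = [Counter(tokenize(s)) for s in sentences]
    doc = Counter()
    for v in vectors:
        doc.update(v)
    # Score sentences by how well they represent the whole document.
    scores = [cosine(v, doc) for v in vectors]
    # Assumption: the ratio counts sentences; keep at least one.
    k = max(1, round(ratio * len(sentences)))
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Re-sequence the selected sentences in their original order.
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(["a b", "a b c", "z"], 0.34)` keeps only `"a b c"`, the sentence closest to the document vector. A latent-topic approach would replace the raw cosine score with a sentence score derived from topic distributions.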
Geometric Learning of Hidden Markov Models via a Method of Moments Algorithm
We present a novel algorithm for learning the parameters of hidden Markov models (HMMs) in a geometric setting where the observations take values in Riemannian manifolds. In particular, we extend a recent second-order method of moments algorithm that incorporates non-consecutive correlations to a more general setting where observations take place in a Riemannian symmetric space of non-positive curvature and the observation likelihoods are Riemannian Gaussians. The resulting algorithm decouples into a Riemannian Gaussian mixture model estimation algorithm followed by a sequence of convex optimization procedures. We demonstrate through examples that the learner can yield significantly improved speed and numerical accuracy compared to existing learners.
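The "second-order moments with non-consecutive correlations" idea can be illustrated in a much simpler discrete, Euclidean analog; this is not the Riemannian algorithm described above. Assuming a stationary HMM with state distribution `pi`, transition matrix `A` (with `A[s][s2]` = Pr(next = s2 | current = s)), and a discrete emission matrix `O` (with `O[i][s]` = Pr(y = i | state = s)) standing in for the Riemannian Gaussian likelihoods, the joint probability of observations k steps apart is P_k = O diag(pi) A^k O^T. A method of moments matches these model moments to empirical lag-k co-occurrence frequencies. All function names are hypothetical.

```python
def matmul(X, Y):
    # Plain-Python matrix product.
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def matpow(A, k):
    # A^k by repeated multiplication (k >= 0).
    n = len(A)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        R = matmul(R, A)
    return R


def lag_moment(pi, A, O, k):
    # Model moment: P_k[i][j] = Pr(y_t = i, y_{t+k} = j) = (O diag(pi) A^k O^T)[i][j].
    Ak = matpow(A, k)
    n_obs, n_states = len(O), len(pi)
    return [[sum(pi[s] * O[i][s] * Ak[s][s2] * O[j][s2]
                 for s in range(n_states) for s2 in range(n_states))
             for j in range(n_obs)] for i in range(n_obs)]


def empirical_lag_moment(ys, n_obs, k):
    # Empirical counterpart: relative frequency of symbol pairs k steps apart.
    counts = [[0.0] * n_obs for _ in range(n_obs)]
    pairs = len(ys) - k
    for t in range(pairs):
        counts[ys[t]][ys[t + k]] += 1
    return [[c / pairs for c in row] for row in counts]
```

Using lags k > 1 gives additional equations beyond consecutive pairs. In the geometric setting of the abstract, the emission step is instead handled by Riemannian Gaussian mixture estimation, and the remaining parameters by convex optimization.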
A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents
In recent years, statistical modeling approaches have steadily gained popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and by comparison with the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced. Fusion of information from word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking corpora (TDT-2 and TDT-3), and very encouraging retrieval performance was obtained.
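As a rough illustration of HMM/N-gram-style retrieval, the sketch below ranks documents by the likelihood of the query under a two-state mixture: a document unigram model and a corpus-wide unigram model. This is a simplified assumption, not the article's full structure: bigram states are omitted, and the mixture weight `m_doc` is fixed by hand rather than trained with EM or MCE. Tokens may be words or syllables; the function names are hypothetical.

```python
import math
from collections import Counter


def unigram(tokens):
    # Maximum-likelihood unigram model from a token list.
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def score(query, doc_tokens, corpus_model, m_doc=0.7):
    # log P(Q | D) with each query token emitted by either the
    # document state (weight m_doc) or the corpus state (1 - m_doc).
    doc_model = unigram(doc_tokens)
    logp = 0.0
    for q in query:
        p = m_doc * doc_model.get(q, 0.0) + (1 - m_doc) * corpus_model.get(q, 0.0)
        if p == 0.0:
            return float("-inf")  # token unseen anywhere in the collection
        logp += math.log(p)
    return logp


def rank(query, docs, m_doc=0.7):
    # docs: mapping from document name to its token list.
    corpus_model = unigram([t for tokens in docs.values() for t in tokens])
    return sorted(docs, key=lambda name: score(query, docs[name], corpus_model, m_doc),
                  reverse=True)
```

For instance, with `docs = {"d1": ["taiwan", "news", "broadcast"], "d2": ["weather", "rain", "news"]}`, the query `["taiwan", "news"]` ranks `d1` first. The corpus state smooths the document model so that a query token absent from one document does not zero out its score.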
- …